尽管两阶段矢量量化(VQ)生成模型允许合成高保真性和高分辨率图像,但其量化操作员将图像中的相似贴片编码为相同的索引,从而为相似的相邻区域重复使用现有的解码器体系结构的相似相似区域的重复伪像。为了解决这个问题,我们建议将空间条件的归一化结合起来,以调节量化的向量,以便将空间变体信息插入嵌入式索引图中,从而鼓励解码器生成更真实的图像。此外,我们使用多通道量化来增加离散代码的重组能力,而无需增加模型和代码簿的成本。此外,为了在第二阶段生成离散令牌,我们采用掩盖的生成图像变压器(MaskGit)来学习压缩潜在空间中的基础先验分布,该分布比常规自动回归模型快得多。两个基准数据集的实验表明,我们提出的调制VQGAN能够大大提高重建的图像质量,并提供高保真图像的产生。
translated by 谷歌翻译
域适应(DA)从严格的理论作品中获益,研究其富有识别特征和各个方面,例如学习领域 - 不变的表示及其权衡。然而,由于多个源域的参与和训练期间目标域的潜在不可用的域,因此似乎不是这种源DA和域泛化(DG)设置的情况非常复杂和复杂。在本文中,我们为目标一般损失开发了新的上限,吸引我们来定义两种域名不变的表示。我们进一步研究了利弊以及执行学习每个领域不变的表示的权衡。最后,我们进行实验检查这些陈述的权衡,以便在实践中提供有关如何使用它们的实践提示,并探索我们发达理论的其他有趣性质。
translated by 谷歌翻译
内核分割旨在将数据序列划分为可能具有非线性和复杂结构的几个非重叠段。通常,它被称为组合约束的离散优化问题。最佳解决此问题的流行算法是动态编程(DP),它具有二次计算和内存要求。鉴于实践中的序列太长,这种算法不是一种实际方法。尽管已经提出了许多启发式算法来近似最佳分割,但他们无法保证其解决方案的质量。在本文中,我们采取了一种可区分的方法来减轻上述问题。首先,我们引入了一种新型的基于Sigmoid的正则化,以平稳近似组合约束。将其与平衡内核聚类的目标相结合,我们制定了一种用基于Sigmoid的正则化(KCSR)称为内核聚类的可区分模型,可以利用基于梯度的算法来获得最佳分段。其次,我们开发了提出模型的随机变体。通过使用具有较低时间和空间复杂性的随机梯度下降算法以进行优化,第二个模型可以对横长的数据序列进行分割。最后,为了同时分割多个数据序列,我们稍微修改了基于Sigmoid的正则化,以进一步引入所提出模型的扩展变体。通过对我们模型的各种数据序列性能进行的广泛实验,并将其与现有方法的表演进行了比较。实验结果验证了所提出的模型的优势。我们的MATLAB源代码可在GitHub上获得。
translated by 谷歌翻译
Here, we demonstrate how machine learning enables the prediction of comonomers reactivity ratios based on the molecular structure of monomers. We combined multi-task learning, multi-inputs, and Graph Attention Network to build a model capable of predicting reactivity ratios based on the monomers chemical structures.
translated by 谷歌翻译
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io
translated by 谷歌翻译
Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.
translated by 谷歌翻译
Recent 3D-based manipulation methods either directly predict the grasp pose using 3D neural networks, or solve the grasp pose using similar objects retrieved from shape databases. However, the former faces generalizability challenges when testing with new robot arms or unseen objects; and the latter assumes that similar objects exist in the databases. We hypothesize that recent 3D modeling methods provides a path towards building digital replica of the evaluation scene that affords physical simulation and supports robust manipulation algorithm learning. We propose to reconstruct high-quality meshes from real-world point clouds using state-of-the-art neural surface reconstruction method (the Real2Sim step). Because most simulators take meshes for fast simulation, the reconstructed meshes enable grasp pose labels generation without human efforts. The generated labels can train grasp network that performs robustly in the real evaluation scene (the Sim2Real step). In synthetic and real experiments, we show that the Real2Sim2Real pipeline performs better than baseline grasp networks trained with a large dataset and a grasp sampling method with retrieval-based reconstruction. The benefit of the Real2Sim2Real pipeline comes from 1) decoupling scene modeling and grasp sampling into sub-problems, and 2) both sub-problems can be solved with sufficiently high quality using recent 3D learning algorithms and mesh-based physical simulation techniques.
translated by 谷歌翻译
语义分割是开发医学图像诊断系统的重要任务。但是,构建注释的医疗数据集很昂贵。因此,在这种情况下,半监督方法很重要。在半监督学习中,标签的质量在模型性能中起着至关重要的作用。在这项工作中,我们提出了一种新的伪标签策略,可提高用于培训学生网络的伪标签的质量。我们遵循多阶段的半监督训练方法,该方法在标记的数据集上训练教师模型,然后使用训练有素的老师将伪标签渲染用于学生培训。通过这样做,伪标签将被更新,并且随着培训的进度更加精确。上一个和我们的方法之间的关键区别在于,我们在学生培训过程中更新教师模型。因此,在学生培训过程中,提高了伪标签的质量。我们还提出了一种简单但有效的策略,以使用动量模型来提高伪标签的质量 - 训练过程中原始模型的慢复制版本。通过应用动量模型与学生培训期间的重新渲染伪标签相结合,我们在五个数据集中平均达到了84.1%的骰子分数(即Kvarsir,CVC-ClinicdB,Etis-laribpolypdb,cvc-colondb,cvc-colondb,cvc-colondb和cvc-300)和CVC-300)只有20%的数据集用作标记数据。我们的结果超过了3%的共同实践,甚至在某些数据集中取得了完全监督的结果。我们的源代码和预培训模型可在https://github.com/sun-asterisk-research/online学习SSL上找到
translated by 谷歌翻译
套件是指准备和分组必要的零件和工具(或“套件”)以在制造环境中组装。自动化此过程可简化人工工人的组装任务,并提高效率。现有的自动化套件系统遵守脚本指示和预定义的启发式方法。但是,鉴于零件和逻辑延迟的可用性差异,现有系统的僵化性可以限制装配线的整体效率。在本文中,我们提出了一个双重优化框架,以使机器人能够执行基于任务分割的零件选择,套件布置和交付计划,以及时提供定制的套件 - 即在需要时正确。我们通过人类主题研究(n = 18)评估了提出的方法,涉及基于研究的数据构建平板家具桌和购物流仿真。我们的结果表明,与使用由任务图本身定义的刚性任务分割边界定义的基线方法相比,与基线方法相比,与基线方法相比,即将到来的套件系统更有效,对上游商店流量延迟有弹性,并且比较更好地优选。单个套件,包括组装单个单元所需的所有零件。
translated by 谷歌翻译
现有的最新3D点云实例分割方法依赖于基于分组的方法,该方法指向获得对象实例。尽管产生准确的分割结果方面有所改善,但这些方法缺乏可扩展性,通常需要将大量输入分为多个部分。为了处理数百万点的场景,现有的最快方法软组\ cite {vu2022222222222222222222222222222222222222ggroup}需要数十秒钟,这是满意的。我们的发现是,$ k $ neart的邻居($ k $ -nn)是分组的先决条件,是计算瓶颈。这种瓶颈严重使现场的推理时间恶化了很多。本文提出了软组++来解决此计算瓶颈,并进一步优化了整个网络的推理速度。 SoftGroup ++建立在软组上,这在三个重要方面有所不同:(1)执行OCTREE $ K $ -NN而不是Vanilla $ k $ -nn,以将时间复杂性从$ \ Mathcal {o}(n^2)缩短到$ \ Mathcal {o}(n \ log n)$,(2)执行金字塔缩放,适应性下降样本骨干输出以减少$ k $ -nn和分组的搜索空间,并且(3)执行后期的Devoxelization,延迟了Voxels的转换指向模型的结束,以使中间组件以低计算成本运行。在各种室内和室外数据集上进行了广泛的实验,证明了拟议的软组++的功效。值得注意的是,SoftGroup ++在一个前方的情况下通过单个前方进行了大量的场景,而无需将输入分为多个部分,从而丰富了上下文信息。特别是,SoftGroup ++达到2.4点AP $ _ {50} $改进,而$ 6 \ $ 6 \ times $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $。代码和训练有素的模型将公开可用。
translated by 谷歌翻译